Summary
Objectives: Component-wise boosting algorithms have evolved into a popular estimation scheme
in biomedical regression settings. The number of boosting iterations is the most important
tuning parameter for optimizing their performance. To date, no fully automated
strategy for determining the optimal stopping iteration of boosting algorithms has
been proposed.
Methods: We propose a fully data-driven sequential stopping rule for boosting algorithms.
It combines resampling methods with a modified version of an earlier stopping approach
that depends on AIC-based information criteria. The new “subsampling after AIC” stopping
rule is applied to component-wise gradient boosting algorithms.
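To make the idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of the general scheme: run componentwise L2 gradient boosting on random subsamples, pick a stopping iteration on each subsample with a sequential AIC-type criterion, and aggregate the subsample-wise stops. All function names, the crude degrees-of-freedom proxy, and the choice of the median as aggregator are assumptions for illustration only.

```python
import numpy as np

def componentwise_l2_boost(X, y, mstop, nu=0.1):
    """Componentwise L2 gradient boosting with simple linear base-learners:
    at each step, fit the single covariate that best explains the residuals."""
    n, p = X.shape
    f = np.full(n, y.mean())
    path = [f.copy()]
    for _ in range(mstop):
        r = y - f
        best_j, best_rss, best_coef = 0, np.inf, 0.0
        for j in range(p):
            xj = X[:, j]
            coef = xj @ r / (xj @ xj)
            rss = np.sum((r - coef * xj) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_coef = j, rss, coef
        f = f + nu * best_coef * X[:, best_j]  # damped update of the fit
        path.append(f.copy())
    return path

def aic_stop(path, y, patience=3):
    """Sequential AIC-type stopping: stop once the criterion has not
    improved for `patience` consecutive iterations (hypothetical rule;
    degrees of freedom are crudely proxied by the iteration count)."""
    n = len(y)
    best_m, best_aic, worse = 0, np.inf, 0
    for m, f in enumerate(path):
        rss = np.sum((y - f) ** 2)
        aic = n * np.log(rss / n) + 2 * (m + 1)
        if aic < best_aic:
            best_m, best_aic, worse = m, aic, 0
        else:
            worse += 1
            if worse >= patience:
                break
    return best_m

def subsample_after_aic(X, y, mstop=100, B=10, frac=0.5, seed=0):
    """Aggregate the AIC-chosen stopping iteration over B random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stops = []
    for _ in range(B):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        path = componentwise_l2_boost(X[idx], y[idx], mstop)
        stops.append(aic_stop(path, y[idx]))
    return int(np.median(stops))
```

The point of the subsampling step is that a purely AIC-based stop on the full data tends to be unstable; combining several subsample-wise sequential stops gives a more robust, still fully data-driven choice.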
Results: The newly developed sequential stopping rule outperformed earlier approaches when applied
to both simulated and real data. Specifically, it improved on purely AIC-based methods
in the microarray-based prediction of the recurrence of metastases in
stage II colon cancer patients.
Conclusions: The proposed sequential stopping rule can identify the optimal stopping iteration
of boosting algorithms during the fitting process itself,
at least for the most common loss functions.
Keywords
Gradient boosting - resampling methods - early stopping - variable selection - penalized
regression